Abstract

This paper deals with the problem of multivariate copula density estimation.
Using wavelet methods, we provide two shrinkage procedures based on thresholding
rules for which knowledge of the regularity of the copula density to be estimated
is not necessary. These methods, said to be adaptive, are proved to perform very
well under both the minimax and the maxiset approaches. Moreover, we show
that these procedures can be discriminated in the maxiset sense. We produce an
estimation algorithm whose qualities are evaluated through simulations. Last,
we propose a real-life application to financial data.
maxiset theory.

AMS Subject Classification: 62G10, 62G20, 62G30. 



1 Introduction 



Recently a new tool has appeared (principally for risk management, for
instance in finance, insurance, climatology, hydrology, etc.) to model the
dependence structure of the data. Let us start by recalling the seminal result of
Sklar [n].

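Theorem 1 (Sklar (1959)) Let $d \ge 2$ and H be a d-variate distribution
function. If, for $m = 1, \dots, d$, every margin $F_m$ of H is continuous, there exists
a unique d-variate function C with uniform margins $U[0,1]$ such that

$$H(x^1, \dots, x^d) = C(F_1(x^1), \dots, F_d(x^d)) \quad \text{for all } (x^1, \dots, x^d) \in \mathbb{R}^d.$$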

The distribution function C is called the copula associated with the distribution
H. The interest of Sklar's theorem is the following: it is possible to study
separately the laws of the coordinates $X^m$, $m = 1, \dots, d$, of any vector X whose
law is H and the dependence between the coordinates.
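The separation between margins and dependence can be illustrated with a minimal sketch. It assumes a bivariate Gaussian copula with correlation `rho`, an exponential first margin and a standard Gaussian second margin; the function names `sample_gaussian_copula` and `apply_margins` are hypothetical and are not part of the method studied in this paper.

```python
import math
import random
from statistics import NormalDist

def sample_gaussian_copula(n, rho, seed=0):
    """Draw n points (U, V) from a bivariate Gaussian copula with correlation rho."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    points = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # The probability integral transform makes both margins uniform on [0, 1].
        points.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return points

def apply_margins(points, rate=4.0):
    """Impose margins on copula draws: X ~ Exponential(rate), Y ~ N(0, 1)."""
    std_normal = NormalDist()
    data = []
    for u, v in points:
        x = -math.log(1.0 - u) / rate   # F_1^{-1}(u) for the exponential law
        y = std_normal.inv_cdf(v)       # F_2^{-1}(v) for the standard Gaussian law
        data.append((x, y))
    return data

# The dependence is fixed first (copula), the margins are chosen afterwards.
uv = sample_gaussian_copula(1000, rho=0.7)
xy = apply_margins(uv)
```

Changing `apply_margins` alters the laws of the coordinates but not the copula of the resulting sample.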

The copula model has been extensively studied in a parametric framework
for the distribution function C. Large classes of copulas, such as the
elliptic family, which contains the Gaussian copula and the Student copula,
and the Archimedean family, which contains the Gumbel copula, the Clayton
copula and the Frank copula, have been identified. Work has mainly gone
in two directions. First, considerable activity has concerned modeling,
with a view to finding new copulas and methodologies to simulate data
coming from these new copulas. Second, the usual statistical inference
(estimation of the parameters, goodness-of-fit tests, etc.) has been developed
using copulas. As usual, the nonparametric point of view is useful when no a
priori model of the phenomenon is specified. For practitioners, nonparametric
estimators can be seen as a benchmark that helps to specify the model by
comparison with the available parametric families. This explains
the success of the nonparametric estimator of the copula. Unfortunately,
practitioners have difficulty analyzing the graphs of the distribution
functions (when d = 2). Generally, they try to comment on
the scatter plot of $\{(X_i, Y_i), i = 1, \dots, n\}$ (or $\{(R_i, S_i), i = 1, \dots, n\}$, where
R, S are the rank statistics of X, Y). Therefore a density estimator could be
very advantageous.

We propose procedures to estimate the density associated with the copula C.
This density, denoted c, is called the copula density. The copula density model
that we consider is the following. Let $(X_1^1, \dots, X_1^d), \dots, (X_n^1, \dots, X_n^d)$
be an n-sample of independent data admitting the same distribution H (and
the same density h) as $(X^1, \dots, X^d)$. We denote by $F_1, \dots, F_d$ the margins of the
coordinates of the vector $(X^1, \dots, X^d)$. We are interested in the estimation
of the copula density c, which is defined as the derivative (if it exists) of the
copula distribution:

$$c(u_1, \dots, u_d) = \frac{h(F_1^{-1}(u_1), \dots, F_d^{-1}(u_d))}{f_1(F_1^{-1}(u_1)) \cdots f_d(F_d^{-1}(u_d))}$$

where $f_p$ denotes the density of the margin $F_p$,
$F_p^{-1}(u_p) = \inf\{x \in \mathbb{R} : F_p(x) \ge u_p\}$, $1 \le p \le d$, and $u = (u_1, \dots, u_d) \in [0, 1]^d$.
This model is a very classical density model, but the direct observations
$(U_i^1 = F_1(X_i^1), \dots, U_i^d = F_d(X_i^d))$, $i = 1, \dots, n$, of the copula C are not
available, because the margins are generally unknown. Observe that a similar
problem is the nonparametric regression model with random design.
This model has been studied in Kerkyacharian and Picard [15] using warped
wavelet families.
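As a worked instance of the definition of the copula density, the sketch below evaluates the ratio of the joint density to the product of the marginal densities for a bivariate normal law with standard margins. The choice of this law and the name `gaussian_copula_density` are assumptions made for illustration only.

```python
import math
from statistics import NormalDist

def gaussian_copula_density(u, v, rho):
    """c(u, v) = h(F1^{-1}(u), F2^{-1}(v)) / (f1(F1^{-1}(u)) * f2(F2^{-1}(v)))
    for a bivariate normal density h with standard margins and correlation rho."""
    std = NormalDist()
    x, y = std.inv_cdf(u), std.inv_cdf(v)          # quantile transforms F_p^{-1}
    det = 1.0 - rho * rho
    joint = math.exp(-(x * x - 2.0 * rho * x * y + y * y) / (2.0 * det)) / (
        2.0 * math.pi * math.sqrt(det))             # joint density h(x, y)
    return joint / (std.pdf(x) * std.pdf(y))        # divide by the marginal densities
```

With `rho = 0` the coordinates are independent and the function returns 1 everywhere; as `rho` approaches 1 the mass concentrates near the corners (0, 0) and (1, 1) of the unit square.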

In this paper, we focus on wavelet methods. We emphasize that these methods
are very well suited because the localization properties of wavelets
allow us to give a sharp analysis of the behavior of the copula density near the
border of $[0, 1]^d$. A great advantage of wavelet methods in statistics is that
they provide adaptive procedures, in the sense that they automatically adapt to the
regularity of the object to be estimated. The problem is slightly different in
the case of the copula. So far, practitioners use copula densities which are
regular. Nevertheless, one of the practical problems is that
copula densities are generally concentrated towards the borders of $[0, 1]^d$ (see
Section 6, where simulations are presented). It is very difficult to provide a good
estimate when the considered density admits significant jumps (which is the
case of the copula density when its support is extended). Observe that the lack
of data behind the jumps produces poor estimators, and this is precisely where we
are interested in the estimation. The thresholding methods use multifrequency
levels depending on the place where the estimator is computed. Consequently,
they are well fitted to the problem of estimating the copula density, and we
think that they outperform classical linear methods such as kernel estimators.

We present in this paper the Local Thresholding Method and the Global
(or Block) Thresholding Method. These methods were first studied
by Donoho et al. ([6], [7]) and Kerkyacharian et al. [13]. We prove that the
theoretical properties of these procedures with respect to the quadratic risk
are as good in the copula density model as in the usual density model (when
direct data are available). The good behavior of the usual procedures in the
copula density model is also observed when linear procedures are considered
(see Genest et al. [9]) and for test problems (see Gayraud and Tribouley [8]).
To complete the picture given by the minimax properties of the copula density
estimators, we explore the maxiset properties of both procedures (and also of
the linear procedure). Again, we prove that the fact that no direct
observations are available does not affect the properties of the wavelet procedures: as
in the standard density model, the local thresholding method of estimation
outperforms the other ones.

Next, we provide an important practical study. Another advantage of
wavelet procedures is their remarkable ease of use. Indeed, many
implementations of the fast wavelet transform exist, and it is just a matter
of selecting one in a given programming language. They can be used
almost directly after a simple warping of the original samples. Nevertheless,
as noticed previously, most of the pertinent information in the case of the
copula density is located near the border of $[0, 1]^d$, so the boundaries
should be handled with special care. One of the results of this
article is to show that an inappropriate handling, such as the one proposed by
default by many implementations, can cause the wavelet method to fail. The
symmetrization/periodization process proposed in this paper is described. To
reduce the gridding effect of the wavelets, we further propose to incorporate
some limited translation invariance in the method. We first comment on our
results for simulated data in the case of the usual parametric copula families,
and then we present an application to financial data: we propose a method
to estimate the copula density imposing that the "true" copula belongs to
a target parametric family and using the nonparametric estimator as a
benchmark.

The paper is organized as follows. Section 2 deals with the wavelet setting
and describes the multidimensional wavelet basis used in the sequel.
Section 3 describes both thresholding estimation procedures, whose
performances are studied in Section 4 with the minimax approach and
in Section 5 with the maxiset approach. Section 6 is devoted to the
practical results. Proofs of the main theorems are given in Section 7 and rely
on a proposition and technical lemmas proved in the Appendix.



2 Wavelet setting 

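Let $\phi$ and $\psi$ be respectively a scaling function and an associated wavelet
function. We assume that these functions are compactly supported on $[0, L]$ for
some $L > 1$; see for example the Daubechies wavelets (Daubechies [5]).
For any univariate function $h(\cdot)$, we denote by $h_{j,k}(\cdot)$ the function
$2^{j/2} h(2^j \cdot - k)$ where $j \in \mathbb{N}$ and $k \in \mathbb{Z}$. In the sequel, we use wavelet
expansions for multivariate functions. We build a multivariate wavelet basis as follows:

$$\phi_{j,k}(x_1, \dots, x_d) = \prod_{m=1}^{d} \phi_{j,k_m}(x_m), \qquad
\psi^{\epsilon}_{j,k}(x_1, \dots, x_d) = \prod_{m=1}^{d} \phi^{1-\epsilon_m}_{j,k_m}(x_m)\, \psi^{\epsilon_m}_{j,k_m}(x_m),$$

for all $\epsilon = (\epsilon_1, \dots, \epsilon_d) \in S_d = \{0, 1\}^d \setminus \{(0, \dots, 0)\}$. We keep the same notation
for the univariate scaling function and the multivariate scaling function; the subscripts
$k = (k_1, \dots, k_d)$ indicate the number of components. For any $j_0 \in \mathbb{N}$, the set
$\{\phi_{j_0,k}, \psi^{\epsilon}_{j,k} \mid j \ge j_0, k \in \mathbb{Z}^d, \epsilon \in S_d\}$ is an orthonormal basis of $L_2(\mathbb{R}^d)$ and the
expansion of any real function h of $L_2(\mathbb{R}^d)$ is given by:

$$\forall x \in \mathbb{R}^d, \quad h(x) = \sum_{k \in \mathbb{Z}^d} h_{j_0,k}\, \phi_{j_0,k}(x) + \sum_{j \ge j_0} \sum_{k \in \mathbb{Z}^d} \sum_{\epsilon \in S_d} h^{\epsilon}_{j,k}\, \psi^{\epsilon}_{j,k}(x),$$

where, for any $(j, k, \epsilon) \in \mathbb{N} \times \mathbb{Z}^d \times S_d$, the wavelet coefficients are

$$h_{j_0,k} = \int_{\mathbb{R}^d} h(x)\, \phi_{j_0,k}(x)\, dx \quad \text{and} \quad h^{\epsilon}_{j,k} = \int_{\mathbb{R}^d} h(x)\, \psi^{\epsilon}_{j,k}(x)\, dx.$$

Roughly speaking, the expansion of the analyzed function on the wavelet basis
splits into the "trend" at the level $j_0$ and the sum of the "details" for all the
larger levels j, $j \ge j_0$.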
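The tensor-product construction can be checked numerically on the simplest example. The sketch below assumes the Haar functions (support [0, 1)), which are chosen here only because they are explicit; the helper names are hypothetical.

```python
import itertools

def haar_phi(x):
    """Haar scaling function: indicator of [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0

def haar_psi(x):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1)."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0

def dilate(h, j, k):
    """Return h_{j,k}(x) = 2^{j/2} h(2^j x - k)."""
    return lambda x: 2.0 ** (j / 2.0) * h(2.0 ** j * x - k)

def tensor_wavelet(j, k, eps):
    """psi^eps_{j,k}: psi in coordinate m when eps[m] == 1, phi otherwise."""
    factors = [dilate(haar_psi if e else haar_phi, j, km) for e, km in zip(eps, k)]
    def f(x):
        value = 1.0
        for factor, xm in zip(factors, x):
            value *= factor(xm)
        return value
    return f

def inner_product(f, g, d=2, m=16):
    """Midpoint rule on an m^d grid of [0, 1]^d; exact for Haar functions
    of level j <= 3, which are constant on every grid cell."""
    total = 0.0
    for idx in itertools.product(range(m), repeat=d):
        x = [(i + 0.5) / m for i in idx]
        total += f(x) * g(x)
    return total / m ** d

# S_2 = {0, 1}^2 without (0, 0): the three "detail" orientations in dimension 2.
S2 = [e for e in itertools.product((0, 1), repeat=2) if any(e)]
```

For instance, the three functions `tensor_wavelet(1, (0, 1), e)` with `e` in `S2` have unit norm and are mutually orthogonal under `inner_product`.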






3 Estimation procedures 



Assuming that the copula density c belongs to $L_2$, we present wavelet
procedures of estimation. Motivated by the wavelet expansion, we first estimate the
coefficients of the copula density on the wavelet basis. Observe that, for any
d-variate function $\Phi$,

$$E_c\left(\Phi(U^1, \dots, U^d)\right) = E_h\left(\Phi(F_1(X^1), \dots, F_d(X^d))\right),$$

which means that the wavelet coefficients $c_{j_0,k}$, $c^{\epsilon}_{j,k}$ of the copula density c
on the wavelet basis $\{\phi_{j_0,k}, \psi^{\epsilon}_{j,k} \mid j \ge j_0, k \in \mathbb{Z}^d, \epsilon \in S_d\}$ are equal to the
coefficients of the joint density h on the warped wavelet family

$$\{\phi_{j_0,k}(F_1(\cdot), \dots, F_d(\cdot)),\ \psi^{\epsilon}_{j,k}(F_1(\cdot), \dots, F_d(\cdot)) \mid j \ge j_0, k \in \mathbb{Z}^d, \epsilon \in S_d\}.$$

As usual, the standard empirical coefficients are

$$\hat c_{j_0,k} = \frac{1}{n} \sum_{i=1}^{n} \phi_{j_0,k}(F_1(X_i^1), \dots, F_d(X_i^d))
\quad \text{and} \quad
\hat c^{\epsilon}_{j,k} = \frac{1}{n} \sum_{i=1}^{n} \psi^{\epsilon}_{j,k}(F_1(X_i^1), \dots, F_d(X_i^d)). \qquad (1)$$

Since no direct observation $(F_1(X_i^1), \dots, F_d(X_i^d))$ is usually available, we
propose to replace it with the pseudo-observation $(\hat F_1(X_i^1), \dots, \hat F_d(X_i^d))$,
where $\hat F_1, \dots, \hat F_d$ are the empirical distribution functions associated with the
margins. The empirical coefficients are

$$\tilde c_{j_0,k} = \frac{1}{n} \sum_{i=1}^{n} \phi_{j_0,k}(\hat F_1(X_i^1), \dots, \hat F_d(X_i^d))
= \frac{1}{n} \sum_{i=1}^{n} \phi_{j_0,k}\left(\frac{R_i^1}{n}, \dots, \frac{R_i^d}{n}\right)$$

and

$$\tilde c^{\epsilon}_{j,k} = \frac{1}{n} \sum_{i=1}^{n} \psi^{\epsilon}_{j,k}(\hat F_1(X_i^1), \dots, \hat F_d(X_i^d))
= \frac{1}{n} \sum_{i=1}^{n} \psi^{\epsilon}_{j,k}\left(\frac{R_i^1}{n}, \dots, \frac{R_i^d}{n}\right),$$

where $R_i^p$ denotes the rank of $X_i^p$ for $p = 1, \dots, d$:

$$R_i^p = \sum_{l=1}^{n} 1\{X_l^p \le X_i^p\}.$$

Since the wavelet basis is compactly supported, the sum
over the indices k, $\epsilon$ is finite and is taken over $2^d L^d 2^{dj}$ terms. In the sequel, to
simplify the notation, we omit the bounds of variation of the indices k, $\epsilon$.
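The rank-based coefficients are easy to compute when the scaling function is explicit. The following sketch assumes d = 2 and the Haar scaling function, taken right-closed on each cell so that the pseudo-observation R/n = 1 stays inside the unit square; both choices and all names are illustrative assumptions.

```python
import math
import random

def ranks(values):
    """R_i = #{l : values[l] <= values[i]} (ties are assumed absent)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r

def haar_scaling_coefficients(x, y, j):
    """(1/n) sum_i phi_{j,k1}(R_i/n) phi_{j,k2}(S_i/n) for the Haar scaling
    function, returned as a 2^j x 2^j matrix indexed by (k1, k2)."""
    n, size = len(x), 2 ** j
    r, s = ranks(x), ranks(y)
    coeffs = [[0.0] * size for _ in range(size)]
    for ri, si in zip(r, s):
        # Index k such that k < 2^j * R_i / n <= k + 1 (integer arithmetic).
        k1 = (size * ri - 1) // n
        k2 = (size * si - 1) // n
        coeffs[k1][k2] += size / n   # phi_{j,k1} * phi_{j,k2} equals 2^j on its cell
    return coeffs

rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [xi + rng.gauss(0.0, 1.0) for xi in x]
coeffs = haar_scaling_coefficients(x, y, j=2)
# Ranks are unchanged by increasing transformations of the margins.
same = haar_scaling_coefficients([math.exp(v) for v in x], [v ** 3 for v in y], j=2)
```

Here `coeffs` and `same` coincide, which reflects the fact that the pseudo-observations depend on the data only through their ranks.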

The most natural way to estimate the density c is to reconstruct the density
from the estimated coefficients. For any indices $(j_n, J_n)$ such that $j_n \le J_n$,
we consider the very general family of truncated estimators of c defined by

$$\tilde c_T := \tilde c_T(j_n, J_n) = \sum_{k} \tilde c_{j_n,k}\, \phi_{j_n,k} + \sum_{j=j_n}^{J_n} \sum_{k,\epsilon} \omega^{\epsilon}_{j,k}\, \tilde c^{\epsilon}_{j,k}\, \psi^{\epsilon}_{j,k}, \qquad (2)$$

where $\omega^{\epsilon}_{j,k} \in \{0, 1\}$ for any $(j, k, \epsilon)$. It is intuitive that the more regular the
function c is, the smaller the details are, and hence that a good approximation of
the function c is the trend at a level $j_n$ large enough. Such linear procedures

$$\tilde c_L := \tilde c_L(j_n) = \sum_{k} \tilde c_{j_n,k}\, \phi_{j_n,k}, \qquad (3)$$

which consist in setting $\omega^{\epsilon}_{j,k} = 0$ for any $(j, k, \epsilon)$, have been considered in Genest et al.
[9], where the choice of $j_n$ is discussed. It is also possible to consider
other truncated procedures, such as nonlinear procedures: the local thresholding
procedure consists in "killing" individually the small estimated coefficients,
considering that they do not give any information. Let $\lambda_n > 0$ be a thresholding
level and $(j_n, J_n)$ be the level indices. For $\epsilon \in S_d$ and j varying between $j_n$
and $J_n$, the hard local threshold estimators of the wavelet coefficients are

$$\tilde c^{\epsilon,L}_{j,k} = \tilde c^{\epsilon}_{j,k}\, 1\{|\tilde c^{\epsilon}_{j,k}| > \lambda_n\},$$

leading to the hard local threshold procedure

$$\tilde c_{HL} := \tilde c_{HL}(j_n, J_n) = \sum_{k} \tilde c_{j_n,k}\, \phi_{j_n,k} + \sum_{j=j_n}^{J_n} \sum_{k,\epsilon} \tilde c^{\epsilon,L}_{j,k}\, \psi^{\epsilon}_{j,k}. \qquad (4)$$

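It is also possible to decide to "kill" all the coefficients of the level j if the
information given by this level is too small. For a thresholding level $\lambda_n > 0$
and for levels j varying between $j_n$ and $J_n$, we define the hard global threshold
estimates of the wavelet coefficients as follows

$$\tilde c^{\epsilon,G}_{j,k} = \tilde c^{\epsilon}_{j,k}\, 1\Big\{\sum_{k} (\tilde c^{\epsilon}_{j,k})^2 > 2^{dj} \lambda_n^2\Big\},$$

leading to the hard global threshold procedure

$$\tilde c_{HG} := \tilde c_{HG}(j_n, J_n) = \sum_{k} \tilde c_{j_n,k}\, \phi_{j_n,k} + \sum_{j=j_n}^{J_n} \sum_{k,\epsilon} \tilde c^{\epsilon,G}_{j,k}\, \psi^{\epsilon}_{j,k}. \qquad (5)$$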

The nonlinear procedures given in (4) and (5) depend on the level indices
$(j_n, J_n)$ and on the thresholding level $\lambda_n$, to be chosen by the user. In the
next section, we explain how to determine these parameters in such a way that the
associated procedures achieve optimality properties. We then need to define a
criterion to measure the performance of our procedures.
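The difference between the two rules can be sketched on the coefficients of a single level j and a single orientation. The block-energy comparison with 2^{dj} times the squared threshold used in `hard_global_threshold` is an assumption of this sketch, and both function names are hypothetical.

```python
def hard_local_threshold(level_coeffs, lam):
    """Keep each coefficient individually when |c| > lam, else set it to 0."""
    return [c if abs(c) > lam else 0.0 for c in level_coeffs]

def hard_global_threshold(level_coeffs, lam, j, d):
    """Keep the whole block (one level j, one orientation) when its energy
    sum_k c_k^2 exceeds 2^{d j} * lam^2, else set the whole block to 0.
    The normalisation 2^{d j} is an assumption of this sketch."""
    energy = sum(c * c for c in level_coeffs)
    keep = energy > (2 ** (d * j)) * lam * lam
    return list(level_coeffs) if keep else [0.0] * len(level_coeffs)

block = [0.5, -0.05, 0.02, -0.3]
local = hard_local_threshold(block, lam=0.1)                  # small entries are killed
kept = hard_global_threshold(block, lam=0.1, j=1, d=2)        # energy 0.3429 > 0.04
killed = hard_global_threshold(block, lam=0.1, j=3, d=2)      # energy 0.3429 <= 0.64
```

The local rule decides coefficient by coefficient, whereas the global rule makes one decision per block, so a block with a few large coefficients is either kept entirely (including its small entries) or discarded entirely.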



4 Minimax Results 

4.1 Minimax approach

A well-known way to analyze the performance of estimation procedures is
the minimax theory, which has been extensively developed since the 1980s.
In the minimax setting, the practitioner chooses a loss function $\ell(\cdot)$ to
measure the loss due to the studied procedure and a functional space $\mathcal{F}$ to which
the unknown object to be estimated is supposed to belong. The choice of this
space is important because the first step (when the minimax theory is applied)
consists in computing the minimax risk associated with this functional space.
This quantity

$$R_n(\mathcal{F}) = \inf_{\tilde c} \sup_{c \in \mathcal{F}} E\, \ell(\tilde c - c)$$

(where the infimum is taken over all the estimators $\tilde c$ of c) is a lower bound
giving the best rate achievable on the space $\mathcal{F}$. When a statistician proposes
an estimation procedure for functions belonging to $\mathcal{F}$, he has to evaluate the
risk of his procedure and compare this upper bound with the minimax risk.
If the rates coincide, the procedure is minimax optimal on the space $\mathcal{F}$.
Many minimax results have now been established for many statistical models
and many families of functional spaces, such as Sobolev spaces, Hölder spaces
and the family of Besov spaces (see for instance Ibragimov and
Khasminski [11] or Kerkyacharian and Picard [12]).

4.2 Besov spaces

Since we deal with wavelet methods, it is very natural to consider Besov
spaces as functional spaces because they are characterized in terms of wavelet
coefficients as follows.

Definition 1 (Strong Besov spaces) For any $s > 0$, a function c belongs
to the Besov space $B^s_{2,\infty}$ if and only if its sequence of wavelet coefficients $c^{\epsilon}_{j,k}$
satisfies

$$\sup_{J \ge 0} 2^{2Js} \sum_{j \ge J} \sum_{k,\epsilon} (c^{\epsilon}_{j,k})^2 < \infty.$$

An advantage of the Besov spaces $B^s_{2,\infty}$ is to provide a useful tool to classify
wavelet-decomposed signals according to their regularity and sparsity
properties (see for instance Donoho and Johnstone [6]). Last, it is well known that
the minimax risk measured with the quadratic loss on this space satisfies

$$\sup_{n} \inf_{\tilde c} \sup_{c \in B^s_{2,\infty}} n^{\frac{2s}{2s+d}}\, E\|\tilde c - c\|_2^2 < \infty,$$

where the infimum is taken over all estimators of the density c.

4.3 Optimality

Let us focus on the quadratic loss function. Choosing a wavelet regular enough
and when d = 2, Genest et al. [9] prove that the linear procedure $\tilde c_L = \tilde c_L(j_n)$
defined in (3) provides an optimal estimator on the Besov space $B^s_{2,\infty}$ for some
fixed $s > 0$ as soon as $j_n$ is chosen as follows:

$$2^{j_n - 1} < n^{\frac{1}{2s+d}} \le 2^{j_n}.$$

From a practical point of view, this result is not completely satisfying because
the optimal procedure depends on the regularity s of the density, which is
generally unknown. To avoid this drawback, many works in the nonparametric
setting, as in Cohen et al. [12] and in Kerkyacharian and Picard [Hj, inspired by
Donoho and Johnstone's studies on shrinkage procedures (see for instance
[6]), build adaptive procedures, which means that they do not depend on
any a priori information about the unknown density. For instance, note that
the thresholding procedures described in (4) and (5) are clearly adaptive.
The following theorem (whose proof is a direct consequence of Theorem 4,
established in the next section, by considering some inclusion properties of
spaces) gives results on their rates.

Theorem 2 Let us consider a continuously differentiable wavelet function
and let $s > 0$. Let us choose the integers $j_n$ and $J_n$ and the real $\lambda_n$ such
that

$$2^{j_n - 1} < (\log(n))^{1/d} \le 2^{j_n}, \qquad
2^{J_n - 1} < \left(\frac{n}{\log n}\right)^{1/d} \le 2^{J_n}, \qquad
\lambda_n = \sqrt{\frac{\kappa \log(n)}{n}}$$

for some $\kappa > 0$. Let $\tilde c$ be either the hard local thresholding procedure $\tilde c_{HL}(j_n, J_n)$
or the hard global thresholding procedure $\tilde c_{HG}(j_n, J_n)$. Then, as soon as n is
large enough, we get

$$c \in B^s_{2,\infty} \cap L_\infty \implies \sup_{n} \left(\frac{n}{\log(n)}\right)^{\frac{2s}{2s+d}} E\|\tilde c - c\|_2^2 < \infty.$$

We immediately deduce

Corollary 4.1 The hard local thresholding procedure $\tilde c_{HL}$ and the hard global
thresholding procedure $\tilde c_{HG}$ are adaptive near optimal up to a logarithmic term
(that is the price to pay for adaptation) on the Besov spaces $B^s_{2,\infty}$ for the
quadratic loss function, according to the minimax point of view.

Observe that if $s > d/2$ then the assumption $c \in B^s_{2,\infty} \cap L_\infty$ in Theorem 2
could be replaced by $c \in B^s_{2,\infty}$ because in this particular case $B^s_{2,\infty} \subset L_\infty$.
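The tuning parameters of Theorem 2 do not involve the regularity s and can be computed directly from n and d. The sketch below reads the level conditions as $2^{j-1} < x \le 2^{j}$, that is j equal to the ceiling of $\log_2 x$; the function name and the value of `kappa` are assumptions.

```python
import math

def tuning_parameters(n, d, kappa):
    """Return (j_n, J_n, lambda_n) with
    2^(j_n - 1) < (log n)^(1/d)     <= 2^j_n,
    2^(J_n - 1) < (n / log n)^(1/d) <= 2^J_n,
    lambda_n = sqrt(kappa * log(n) / n)."""
    log_n = math.log(n)
    j_n = math.ceil(math.log2(log_n ** (1.0 / d)))
    J_n = math.ceil(math.log2((n / log_n) ** (1.0 / d)))
    lam_n = math.sqrt(kappa * log_n / n)
    return j_n, J_n, lam_n

# Example with a hypothetical constant kappa = 1.
j_n, J_n, lam_n = tuning_parameters(n=2000, d=2, kappa=1.0)
```

For n = 2000 and d = 2 this gives a coarsest level of 2 and a finest level of 5; only the constant `kappa` is left to the user.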

4.4 Criticism of the minimax point of view

From a practical point of view, the first drawback of the minimax theory is the
necessity to fix a functional space. Corollary 4.1 establishes that no procedure
could be better on the space $B^s_{2,\infty}$ than our hard thresholding procedures (up
to the logarithmic factor), but this space is generally an abstraction for the
practitioner. Moreover, the practitioner knows that his optimal procedure achieves
the best rate but does not know this rate. An answer to this drawback is
given by Lepski, who introduced the concept of the random normalizing factor
(see Hoffmann and Lepski [10]).

Secondly, the practitioner has the choice between both procedures, since Theorem 2
establishes that the hard thresholding procedures (local and global) have the same
performance from the minimax point of view. A natural question then
arises: is it possible to compare thresholding procedures
for estimation of the copula density between themselves? To answer these
remarks, we propose to explore the maxiset approach in the next section.



5 Maxiset Results 

5.1 Maxiset approach 

The maxiset point of view has been developed by Cohen et al. [1] and is
inspired by recent studies in the field of approximation theory. This approach
aims at providing a new way to analyze the performance of estimation
procedures. Contrary to the minimax setting, the maxiset approach consists
in finding the maximal space of functions (called the maxiset) for which a
given procedure attains a prescribed rate of convergence. In this respect,
the maxiset setting is not so far from the minimax one. Nevertheless, it seems
to be more optimistic than the minimax point of view, in the sense that the
maxiset approach points out all the functions well estimated by a fixed
procedure at a given accuracy.

In the sequel, we say that a functional space, namely $MS(\tilde c, r_n)$, is the maxiset
of a fixed estimation procedure $\tilde c$ associated with the rate of convergence $r_n$ if
and only if the following equivalence is satisfied:

$$\sup_{n} r_n^{-1}\, E\|\tilde c - c\|_2^2 < \infty \iff c \in MS(\tilde c, r_n).$$

Notice that the considered loss function is again the quadratic one. As a first
consequence of adopting the maxiset point of view, we observe that if $\tilde c$ is an
estimator of c achieving the minimax rate of convergence $r_n$ on a functional
space V, then V is included in the maxiset of $\tilde c$ associated with the rate $r_n$
but is not necessarily equal to it. Therefore, it becomes possible to distinguish
between two minimax-optimal procedures: for the same target rate, the best
procedure is the one admitting the largest maxiset.

Recently, many papers based on the maxiset approach have appeared in the
white noise model. For instance, it has been proved in Autin et al.
[2] that the hard local thresholding procedure is the best one, in the maxiset
sense, among a large family of shrinkage procedures, called the elitist rules,
composed of all the wavelet procedures using in their construction only
empirical wavelet coefficients with absolute value larger than a prescribed quantity.
This optimality has already been pointed out in density estimation by Autin [1],
who proved that weak Besov spaces are the saturation spaces of
elitist procedures.

5.2 Weak Besov spaces 

To model the sparsity property of functions, a very convenient and natural
tool consists in introducing the following particular class of Lorentz spaces,
namely weak Besov spaces, which are in addition directly connected to the
estimation procedures considered in this paper. We give definitions of the
weak Besov spaces depending on the wavelet basis. However, as established
in Meyer [16] and Cohen et al. [1], most of them also admit other definitions,
which shows that this dependence on the basis is not crucial at all.

Definition 2 (Local weak Besov spaces) For any $0 < r < 2$, a function
$c \in L_2([0, 1]^d)$ belongs to the local weak Besov space $\mathcal{W}_L(r)$ if and only if its
sequence of wavelet coefficients $c^{\epsilon}_{j,k}$ satisfies the following equivalent properties:

• $\sup_{0 < \lambda < 1} \lambda^{r-2} \sum_{j \ge 0} \sum_{k,\epsilon} (c^{\epsilon}_{j,k})^2\, 1\{|c^{\epsilon}_{j,k}| \le \lambda\} < \infty$,

• $\sup_{0 < \lambda < 1} \lambda^{r} \sum_{j \ge 0} \sum_{k,\epsilon} 1\{|c^{\epsilon}_{j,k}| > \lambda\} < \infty$.

Definition 3 (Global weak Besov spaces) For any $0 < r < 2$, a function
$c \in L_2([0, 1]^d)$ belongs to the global weak Besov space $\mathcal{W}_G(r)$ if and only if its
sequence of wavelet coefficients $c^{\epsilon}_{j,k}$ satisfies the following equivalent properties:

• $\sup_{0 < \lambda < 1} \lambda^{r-2} \sum_{j \ge 0} \sum_{k,\epsilon} (c^{\epsilon}_{j,k})^2\, 1\big\{\sum_{k} (c^{\epsilon}_{j,k})^2 \le 2^{dj} \lambda^2\big\} < \infty$,

• $\sup_{0 < \lambda < 1} \lambda^{r} \sum_{j \ge 0} 2^{dj} \sum_{\epsilon} 1\big\{\sum_{k} (c^{\epsilon}_{j,k})^2 > 2^{dj} \lambda^2\big\} < \infty$.

The equivalences between the properties used in the definition of global weak
Besov spaces and in the definition of local weak Besov spaces can be proved
using the same techniques as those proposed in Cohen et al. [4].

We prove in Section 7.3 the following link between the global weak Besov
space and the local weak Besov space.

Proposition 1 For any $0 < r < 2$, we get $\mathcal{W}_G(r) \subset \mathcal{W}_L(r)$.

Last, we give an upper bound for the expected estimation error (when the
thresholding procedures are used) in the standard density model (i.e. when
direct observations $(U_i^1, \dots, U_i^d)$, $i = 1, \dots, n$, are available). Notice that this
result is stronger than the result given in the minimax part because the
functional assumption is weaker.

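Theorem 3 Let $\bullet$ be either L or G and let $\hat c_{H\bullet}$ be the estimator built in the
same way as $\tilde c_{H\bullet}$ but with the sequence of coefficients $\hat c^{\epsilon}_{j,k}$ defined in (1). Let
$s > 0$ and assume that

$$c \in B^{\frac{ds}{2s+d}}_{2,\infty} \cap \mathcal{W}_\bullet\left(\frac{2d}{2s+d}\right).$$

Then, there exists some $K > 0$ such that for any n

$$E\|\hat c_{H\bullet} - c\|_2^2 \le K \left(\frac{\log(n)}{n}\right)^{\frac{2s}{2s+d}}.$$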



There is no difficulty in proving this result by using the same techniques as in
Cohen et al. [1], where the proof is done for d = 1 for classical thresholding
rules in general models. Nevertheless, for the interested reader, a detailed proof
can be found in Autin et al. [3].



5.3 Performances and comparison of our procedures 



In this section, we study the maxiset performances of the linear procedure and
of the thresholding procedures described in Section 3. We focus on the optimal
minimax procedures, which means that we consider the following choices of
parameters, where $j_n^*$ denotes the level used by the linear procedure:

$$2^{j_n^* - 1} < \left(\frac{n}{\log(n)}\right)^{\frac{1}{2s+d}} \le 2^{j_n^*}, \qquad
\lambda_n = \sqrt{\frac{\kappa \log(n)}{n}} \ \text{ for some } \kappa > 0,$$

$$2^{j_n - 1} < (\log(n))^{1/d} \le 2^{j_n}, \qquad
2^{J_n - 1} < \left(\frac{n}{\log(n)}\right)^{1/d} \le 2^{J_n}.$$

Let us fix $s > 0$. We choose to focus on the (near) minimax rate
$r_n = (n^{-1} \log(n))^{2s/(2s+d)}$ achieved on the space $B^s_{2,\infty}$. The following theorem
exhibits the maxisets of the three procedures associated with the same target
rate $r_n$.

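Theorem 4 Let $s > 0$, and assume that $c \in L_\infty$. For a large choice of $\kappa$, we
have

$$\sup_{n} \left(\frac{n}{\log(n)}\right)^{\frac{2s}{2s+d}} E\|\tilde c_L - c\|_2^2 < \infty \iff c \in B^s_{2,\infty}, \qquad (6)$$

$$\sup_{n} \left(\frac{n}{\log(n)}\right)^{\frac{2s}{2s+d}} E\|\tilde c_{HL} - c\|_2^2 < \infty \iff c \in B^{\frac{ds}{2s+d}}_{2,\infty} \cap \mathcal{W}_L\left(\frac{2d}{2s+d}\right), \qquad (7)$$

$$\sup_{n} \left(\frac{n}{\log(n)}\right)^{\frac{2s}{2s+d}} E\|\tilde c_{HG} - c\|_2^2 < \infty \iff c \in B^{\frac{ds}{2s+d}}_{2,\infty} \cap \mathcal{W}_G\left(\frac{2d}{2s+d}\right). \qquad (8)$$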
It is important to notice that, according to Theorem 3, the fact that direct
observations are not available does not affect the maxiset performances of our
procedures. As remarked by Autin et al. [5], we clearly have

$$B^s_{2,\infty} \subsetneq B^{\frac{ds}{2s+d}}_{2,\infty} \cap \mathcal{W}_G\left(\frac{2d}{2s+d}\right).$$

We deduce by using Proposition 1 that both thresholding estimators
considered in Theorem 2 achieve the minimax rate (up to the logarithmic term) on
a larger functional space than $B^s_{2,\infty}$, which is the required space in the minimax
approach. In particular, Theorem 2 is proved. We now propose to discriminate
these procedures by comparing their maxisets. Thanks to Theorem 4 and
applying the inclusion property given in Proposition 1, we prove in Section 7.3
the following corollary.

Corollary 5.1 Let $s > 0$ and let us consider the target rate

$$r_n = \left(\frac{\log(n)}{n}\right)^{\frac{2s}{2s+d}}.$$

We get

$$MS(\tilde c_L, r_n) \subsetneq MS(\tilde c_{HG}, r_n) \subsetneq MS(\tilde c_{HL}, r_n).$$

Hence, from the maxiset point of view and when the quadratic loss is considered,
the thresholding rules outperform the linear procedure. Moreover, the hard
local thresholding estimator $\tilde c_{HL}$ appears to be the best estimator among the
considered procedures since it strictly outperforms the hard global thresholding
estimator $\tilde c_{HG}$.



6 Applied results

This part is not only an illustration of the theoretical part. First, we explain
the considered algorithm, with several numerical options to overcome its
drawbacks. Next, we test the quality of our methodology on simulations
and we identify the best choices among our proposals. Last, we apply the
chosen procedure to financial data.

6.1 Algorithms

The estimation algorithms are described here for d = 2 for the sake of simplicity,
but their extension to any other dimension is straightforward. We therefore
assume that a sample $\{(X_i, Y_i)\}_{1 \le i \le n}$ of n observations is given.

All estimators proposed in this paper can be summarized in a 7-step
algorithm:

(1) Rank the $X_i$, $Y_i$ with

$$R_i = \sum_{l=1}^{n} 1\{X_l \le X_i\} \quad \text{and} \quad S_i = \sum_{l=1}^{n} 1\{Y_l \le Y_i\}.$$

(2) Compute the maximal scale index $J_n = \left\lfloor \frac{1}{2} \log_2\left(\frac{n}{\log n}\right) \right\rfloor$.

(3) Compute the empirical scaling coefficients at the maximal scale index $J_n$:

$$\tilde c_{J_n,k_1,k_2} = \frac{1}{n} \sum_{i=1}^{n} \phi_{J_n,k_1,k_2}\left(\frac{R_i}{n}, \frac{S_i}{n}\right) \quad \text{for } 1 \le k_1 \le 2^{J_n} \text{ and } 1 \le k_2 \le 2^{J_n}.$$

(4) Compute the empirical wavelet coefficients $\tilde c^{\epsilon}_{j,k_1,k_2}$ from these scaling
coefficients with the fast 2D wavelet transform algorithm.

(5) Threshold these coefficients according to the global thresholding rule or
the local thresholding rule to obtain the estimated wavelet coefficients.

(6) Compute the estimated scaling coefficients $\tilde c^{T}_{J_n,k_1,k_2}$ at scale index $J_n$ by
the fast 2D inverse wavelet transform algorithm.

(7) Construct the estimated copula density $\tilde c$ by the formula

$$\tilde c = \sum_{k_1,k_2} \tilde c^{T}_{J_n,k_1,k_2}\, \phi_{J_n,k_1,k_2}.$$
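Steps (4) to (6) can be sketched as follows, starting from a matrix of scaling coefficients. The sketch assumes the orthonormal Haar wavelet, a matrix whose side is a power of two, and local hard thresholding; it ignores the boundary handling discussed in this section, and all function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(v):
    """One orthonormal Haar step: averages followed by details."""
    half = len(v) // 2
    avg = [(v[2 * i] + v[2 * i + 1]) / SQRT2 for i in range(half)]
    det = [(v[2 * i] - v[2 * i + 1]) / SQRT2 for i in range(half)]
    return avg + det

def inverse_haar_step(w):
    """Inverse of haar_step."""
    half = len(w) // 2
    v = []
    for i in range(half):
        v.append((w[i] + w[half + i]) / SQRT2)
        v.append((w[i] - w[half + i]) / SQRT2)
    return v

def transpose(m):
    return [list(row) for row in zip(*m)]

def transform_block(m, size, step):
    """Apply `step` to the rows, then the columns, of the top-left size x size block."""
    out = [row[:] for row in m]
    for r in range(size):
        out[r][:size] = step(out[r][:size])
    out = transpose(out)
    for r in range(size):
        out[r][:size] = step(out[r][:size])
    return transpose(out)

def haar2d(m, levels):
    """Step (4): fast 2D wavelet transform of the scaling coefficients."""
    size = len(m)
    for _ in range(levels):
        m = transform_block(m, size, haar_step)
        size //= 2
    return m

def hard_threshold_details(w, levels, lam):
    """Step (5), local rule: zero each detail coefficient with |c| <= lam;
    the coarse block in the top-left corner is left untouched."""
    coarse = len(w) >> levels
    return [[c if (r < coarse and k < coarse) or abs(c) > lam else 0.0
             for k, c in enumerate(row)] for r, row in enumerate(w)]

def inverse_haar2d(w, levels):
    """Step (6): fast 2D inverse wavelet transform."""
    size = len(w) >> (levels - 1)
    for _ in range(levels):
        w = transform_block(w, size, inverse_haar_step)
        size *= 2
    return w

scaling = [[1.2, 0.8, 1.0, 1.0], [0.8, 1.2, 1.0, 1.0],
           [1.0, 1.0, 1.5, 0.5], [1.0, 1.0, 0.5, 1.5]]
coeffs = haar2d(scaling, levels=2)
restored = inverse_haar2d(hard_threshold_details(coeffs, 2, lam=0.0), levels=2)
flat = inverse_haar2d(hard_threshold_details(coeffs, 2, lam=10.0), levels=2)
```

With a zero threshold the input is recovered, while a very large threshold keeps only the coarse trend, here a constant matrix equal to the mean of the input.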

Unfortunately, only steps (1), (2) and (5) are as straightforward as they
seem. In all the other steps, one has to tackle two issues: the handling
of the boundaries and the fact that the results are not a function but a finite
matrix of values.

The latter issue is probably the easiest to solve. It means that we should fix
a number of points N larger than $2^{J_n}$ and approximate the estimated copula
density at step (7) on the induced grid $(i/N, j/N)$. We cannot compute exactly
the values on the grid as the scaling function is not always known explicitly.
Nevertheless, a very good approximation can be computed on this grid and
we neglect the effect of this approximation. From the numerical point of view,
this implies that the estimation error can be computed only on this grid, and
thus that the norms appearing in the numerical results (see Tables 1, 2, 3
and 4) are empirical norms on this grid. In this paper, we
choose $N = 4 \cdot 2^{J_n}$. Note that step (3) also requires an evaluation of the scaling
function and is thus replaced by an approximation.

The former issue, the boundary effect, is the key issue here. Indeed, for most
copula densities, the interesting behavior arises in the corners, which are the
most difficult part to handle numerically. The classical construction of the
wavelets yields a basis over $\mathbb{R}^d$ while we only have samples on $[0, 1]^d$.

• A first choice is to consider the function of $[0, 1]^d$ to be estimated as a
function of $\mathbb{R}^d$ which is null outside $[0, 1]^d$. This choice is called zero padding
as it imposes the value 0 outside $[0, 1]^d$.

• A second choice is to suppose that we observe the restriction to $[0, 1]^d$ of a
1-periodic function, which is equivalent to working in the classical periodic wavelet
setting. This choice, called periodization, is very efficient if the function is
really periodic.

• We also propose to modify the periodization and assume that we observe
the restriction to $[0, 1]^d$ of an even 2-periodic function. As we introduce a
symmetrization across the existing borders, we call this method
symmetrization. It avoids the introduction of discontinuities along the border. Note,
however, that this symmetrization introduces discontinuities in the
derivatives on the boundaries.

Once this extension has been performed on the sample, one can apply the
classical wavelet transform. The resulting estimated copula density is the
restriction to $[0, 1]^d$ of the estimated function.
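The three extensions listed above can be compared on a one-dimensional vector of grid values. This is only a sketch: the function names are hypothetical, `pad` is the number of samples added on each side, and the symmetric extension repeats the border sample (one common convention among several).

```python
def zero_pad(signal, pad):
    """Extend by zeros: the function is assumed null outside [0, 1]."""
    return [0.0] * pad + list(signal) + [0.0] * pad

def periodize(signal, pad):
    """Extend as a 1-periodic function: values wrap around the borders."""
    n = len(signal)
    return [signal[(i - pad) % n] for i in range(n + 2 * pad)]

def symmetrize(signal, pad):
    """Extend as an even 2-periodic function: values are mirrored at the borders."""
    n = len(signal)
    extended = []
    for i in range(-pad, n + pad):
        m = i % (2 * n)
        extended.append(signal[m] if m < n else signal[2 * n - 1 - m])
    return extended

values = [1, 2, 3, 4]
padded = zero_pad(values, 2)        # [0, 0, 1, 2, 3, 4, 0, 0]: jump at both borders
periodic = periodize(values, 2)     # [3, 4, 1, 2, 3, 4, 1, 2]: jump from 4 to 1
mirrored = symmetrize(values, 2)    # [2, 1, 1, 2, 3, 4, 4, 3]: no jump, slope changes sign
```

On this increasing sequence, only the mirrored extension is free of jumps at the borders, at the price of a kink in the slope.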

Wavelet thresholding methods in a basis suffer from a gridding effect: we
can often recognize a single wavelet in the estimated signal. To overcome this
effect, we propose to use the cycle spinning trick proposed by Donoho and
Johnstone. To add some translation invariance to the estimation process, we
estimate the copula density with a collection of bases obtained by simple
translations of a single one and average the resulting estimates. In our
numerical experiments, we have performed this operation with 25 different
translation parameters and observed that it always improved our estimate.
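The averaging over translated bases can be sketched in one dimension. The example assumes a one-level Haar transform with hard thresholding and circular shifts of the grid values; the paper's experiments are two-dimensional, so this is an illustration of the principle rather than the procedure used there, and the names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_denoise(signal, lam):
    """One-level Haar transform, hard thresholding of the details, inverse transform."""
    out = []
    for i in range(0, len(signal), 2):
        avg = (signal[i] + signal[i + 1]) / SQRT2
        det = (signal[i] - signal[i + 1]) / SQRT2
        if abs(det) <= lam:
            det = 0.0
        out.append((avg + det) / SQRT2)
        out.append((avg - det) / SQRT2)
    return out

def cycle_spin(signal, lam, shifts):
    """Average the estimates obtained from circularly shifted copies of the signal."""
    n = len(signal)
    total = [0.0] * n
    for s in shifts:
        s %= n
        shifted = signal[s:] + signal[:s]            # translate by s samples
        estimate = haar_denoise(shifted, lam)
        restored = estimate[-s:] + estimate[:-s]     # translate back
        total = [t + r for t, r in zip(total, restored)]
    return [t / len(shifts) for t in total]

step = [0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 4.0, 4.0]
single = haar_denoise(step, lam=3.0)            # blocky: the jump is smeared over one pair
averaged = cycle_spin(step, lam=3.0, shifts=[0, 1])
```

Each shifted estimate places the dyadic grid differently, so averaging reduces the dependence of the result on the position of the grid.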

6.2 Simulation

We focus on the usual parametric families of copulas: the FGM, the Gaussian,
the Student, the Clayton, the Frank and the Gumbel families. We give
results for two values of n (the number of data): n = 500, which is very small for
bidimensional problems, and n = 2000, which is usual in nonparametric estimation.

We test both methods of estimation and three ways to solve the boundary
problems. We simulated data with the same copula, the first margin being
exponential with parameter 4 and the second margin being standard Gaussian.
Obviously, because our algorithm uses the data only through their ranks, the
results are exactly the same when we change the laws of the margins.
To evaluate the quality of our results, we consider three empirical loss functions
deriving from the $L_1$ norm, $L_2$ norm and $L_\infty$ norm:

$$E_q = \|\tilde c - c_0\|_{N,q} \quad \text{for } q = 1, 2, \infty,$$

where $c_0$ is the "true" copula density and $N \times N$ is the number of points of the
grid (see the previous part). Table 3 and Table 4 summarize the relative
estimation errors

$$RE_q = \frac{\|\tilde c - c_0\|_{N,q}}{\|c_0\|_{N,q}} \quad \text{for } q = 1, 2, \infty.$$

These relative errors are computed with 100 repetitions of the experiment.
The associated standard deviation is also given (in parentheses).
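The error criteria can be sketched as follows, reading the relative error as the empirical norm of the difference divided by the empirical norm of the true density on the grid. The 1/N² normalisation of the empirical norms is an assumption of this sketch (it cancels in the ratio), and the function names are hypothetical.

```python
def empirical_norm(grid, q):
    """Empirical L_q norm of an N x N matrix of grid values
    (q = 1, 2 or float('inf')); the 1/N^2 weighting is an assumption."""
    values = [abs(v) for row in grid for v in row]
    if q == float("inf"):
        return max(values)
    return (sum(v ** q for v in values) / len(values)) ** (1.0 / q)

def relative_error(estimate, truth, q):
    """RE_q = ||estimate - truth||_{N,q} / ||truth||_{N,q}."""
    diff = [[e - t for e, t in zip(er, tr)] for er, tr in zip(estimate, truth)]
    return empirical_norm(diff, q) / empirical_norm(truth, q)

truth = [[1.0, 1.0], [1.0, 1.0]]
estimate = [[1.1, 0.9], [1.0, 1.0]]
errors = {q: relative_error(estimate, truth, q) for q in (1, 2, float("inf"))}
```

In this toy example the relative errors are 0.05 for q = 1, about 0.071 for q = 2 and 0.1 for the sup norm.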

Table [1] and Table [2] show that the zero-padding method and the periodization
method give similar results and lead to errors which are generally much
larger than those of the symmetric periodization, which is the best method to
handle the boundary effects. This remark is valid for both n = 500 and n = 2000.
Although the zero-padding method is the default method in the Matlab Wavelet
Toolbox, it suffers from a severe drawback: it introduces strong discontinuities
along the borders of $[0, 1]^d$. The periodization method suffers from the same
drawback as the zero-padding method as soon as the function is not really periodic.
Figure [1] emphasizes the superiority of the symmetric periodization method
in the case where the unknown copula density is a normal copula. While the
copula estimated with symmetric extension remains close to the shape of
the true copula, except for the resolution issue, this is not the case for the two
other estimated copulas: in the periodized version, the height of the extreme
peaks is reduced and two spurious peaks corresponding to the periodization of
the real peaks appear. The zero-padded version shows here only the reduced
height artifact.
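The origin of the boundary discontinuities can be seen on a one-dimensional toy example. The sketch below (hypothetical helper names, half-sample symmetric extension assumed) extends a sequence with the three rules and measures the jump created at each border.

```python
def extend(x, pad, mode):
    """Extend x by pad samples on each side (pad <= len(x))."""
    n = len(x)
    if mode == "zero":
        left, right = [0.0] * pad, [0.0] * pad
    elif mode == "periodic":
        left = [x[i % n] for i in range(-pad, 0)]      # x[n-pad], ..., x[n-1]
        right = [x[i % n] for i in range(pad)]         # x[0], ..., x[pad-1]
    elif mode == "symmetric":
        left = [x[pad - 1 - i] for i in range(pad)]    # mirror of the first samples
        right = [x[n - 1 - i] for i in range(pad)]     # mirror of the last samples
    else:
        raise ValueError("unknown mode")
    return left + list(x) + right

def border_jumps(x, pad, mode):
    """Absolute jumps across the left and right borders of the extension."""
    ext = extend(x, pad, mode)
    n = len(x)
    return abs(ext[pad] - ext[pad - 1]), abs(ext[pad + n] - ext[pad + n - 1])
```

For a sequence with a peak at one end, such as `[4.0, 1.0, 1.0, 2.0]`, zero padding gives jumps (4, 2), periodization gives (2, 2), and the symmetric extension gives (0, 0); only the last one introduces no artificial discontinuity for the thresholding step to absorb.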

Tables [3] and [4] display the empirical $L_1$, $L_2$ and $L_\infty$ estimation errors for
the symmetric extension for n = 500 and n = 2000 respectively. Globally,
they show that the best results are obtained for the $L_2$ norm, for which the
method has been designed. The second best results are obtained for the $L_1$
norm because a bound on the $L_2$ norm implies a bound on the $L_1$ norm. The
estimation in $L_\infty$ is much harder as it is not a consequence of the estimation
in $L_2$ and can be considered as a challenge for such a method.

One can also observe that the behavior largely depends on the copula itself.
This is coherent with the theory, which states that the more "regular" the copula
is, the more efficient the estimator will be. The copulas that are the least well
estimated (Normal with parameter .9, Student with parameter .5 and Gumbel
with parameter 8.33) are the most "irregular" ones. They are very "peaky"
for the last two and almost singular along the first axis for the first one. They
are therefore not regular enough to be estimated correctly by the proposed
method.
A final remark should be made on the difficulty of evaluating these errors.
Whereas the $L_1$ norm is finite and equal to 1 for every true copula density, the
$L_2$ and $L_\infty$ norms can be very large, even infinite, because of the peaks. This
is not an issue from the numerical point of view as we are restricted to a finite
grid, on which one can ensure the finiteness of the copula density. Nevertheless
the induced "empirical" norm can be substantially different from the integrated
norm. Thus the errors for n = 500 and n = 2000 are not strictly comparable, as
the function can be much more complex at the resolution induced by n = 2000
than at the one induced by n = 500.



6.3 Real data applications 



We apply the thresholding methods to various financial series to identify the
behavior of the dependence (or non dependence). All our data correspond
to daily closing market quotations from 01/07/1987 to 31/01/2007.
As usual, we consider the log-returns of the data. Note that the data of
each sample are not necessarily independent, but we apply our procedures
as if they were. We first propose estimators of the bivariate copula density
associated with two financial series using the adaptive thresholding procedures
(see Figure [2] to Figure [5]). Next, the nonparametrical estimator, denoted
$\hat c$, is used as a benchmark and we derive a new estimator by minimizing the
error between the benchmark and a given parametrical family of copula densities.
As previously, we focus on copulas which belong to the Gaussian, Student,
Gumbel, Clayton or Frank families. More precisely, we consider the following
parametric classes of copulas





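\[
\begin{aligned}
\mathcal{C}_1 &= \{c_\theta,\ \theta \in [-0.99 : 0.01 : 0.99]\} \\
\mathcal{C}_2 &= \{c_\theta,\ \theta \in [-0.99 : 0.01 : 0.99,\ 1 : 1 : 100]\} \\
\mathcal{C}_3 &= \{c_\theta,\ \theta \in [1 : 0.01 : 2]\} \\
\mathcal{C}_4 &= \{c_\theta,\ \theta \in [0 : 0.01 : 2]\} \\
\mathcal{C}_5 &= \{c_\theta,\ \theta \in [-2 : 0.01 : 2]\}
\end{aligned}
\]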



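We consider three distances

\[ E_q(\theta, p) = \|\hat c - c_\theta\|_{N,q} \quad \text{for } q = 1, 2, \infty \]

where $c_\theta \in \mathcal{C}_p$, p = 1, ..., 5. We propose to estimate the parameter
$\theta$ for each class $\mathcal{C}_p$ of copula densities as follows

\[ \hat\theta_q^{\,p} = \arg\min_{\theta} E_q(\theta, p) \]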
which appears to be the best estimator of $\theta$ under the constraint that the
copula density c belongs to a fixed parametrical family. We derive estimators of c
among all the candidates $\{c_{\hat\theta_q^{\,p}},\ p = 1, \dots, 5\}$ for each contrast
q = 1, 2, $\infty$. Table 13- Table E] give

• the parameter $\hat\theta_q$ for q = 1, 2, $\infty$,

• the parametric family $\mathcal{C}_p$ corresponding to the smallest error,

• the associated relative errors defined by

\[ RE_q(\hat\theta_q) = 100\, \frac{\|\hat c - c_{\hat\theta_q}\|_{N,q}}{\|c_{\hat\theta_q}\|_{N,q}} \]

where $c_{\hat\theta_q}$ is in $\mathcal{C}_p$.
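The minimum-distance step described above amounts to a grid search over the parameter. The following sketch illustrates it for a single class; to keep the example short we use the FGM density $c_\theta(u, v) = 1 + \theta(1 - 2u)(1 - 2v)$ as a stand-in family (it is not one of the five classes above), a midpoint grid, and the normalized discrete norm. All names are hypothetical and chosen for this illustration only.

```python
def fgm_density(u, v, theta):
    """FGM copula density, used here only as a simple stand-in family."""
    return 1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)

def grid_points(n_grid):
    """Midpoints of a regular partition of [0, 1]."""
    return [(i + 0.5) / n_grid for i in range(n_grid)]

def distance(bench, theta, pts, q):
    """Empirical L_q distance between the benchmark grid and c_theta."""
    diffs = [abs(bench[i][j] - fgm_density(u, v, theta))
             for i, u in enumerate(pts) for j, v in enumerate(pts)]
    if q == "inf":
        return max(diffs)
    return (sum(d ** q for d in diffs) / len(diffs)) ** (1.0 / q)

def fit_theta(bench, pts, q, theta_grid):
    """Parameter of the grid minimizing the distance to the benchmark."""
    return min(theta_grid, key=lambda th: distance(bench, th, pts, q))

# Usage: the benchmark is here an exact FGM density with theta = 0.4.
pts = grid_points(8)
bench = [[fgm_density(u, v, 0.4) for v in pts] for u in pts]
theta_grid = [k / 100 for k in range(-100, 101)]   # mimics [-1 : 0.01 : 1]
fitted = {q: fit_theta(bench, pts, q, theta_grid) for q in (1, 2, "inf")}
```

In this noiseless toy case the three contrasts all recover the value 0.4. With a real benchmark $\hat c$ the three minimizers generally differ, and repeating the search over the classes and keeping the smallest error gives the selected family.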

We tested many financial series and selected four revealing examples: we never
observed that the Clayton family or the Gumbel family contains the best choice
to model the unknown copula; the families selected are always the Student class
or the Frank class.

First, we observe that the results are very good since every computed relative
error is small (in the worst case $RE_q$ < 20%). The results are quite similar
for both thresholding methods when the unknown copula density does
not present high peaks (all cases but the last one: Dow Jones versus Ftse100uk).
From a theoretical point of view, we prefer the block thresholding method because
the estimators are smoother. See for instance the case of the copula between
Brent and ExonMobil, where the peaks appearing in Figure [3] (on the right)
are not pleasant (even if their ranges are not so large). Moreover, the relative
error computed with the parametrical density which is the best one among all
the possible parametrical copula densities is generally smaller for the block
thresholding method.

Notice that the choice of the contrast is crucial to estimate the parameter
$\theta$: there are significant differences between $\hat\theta_1$, $\hat\theta_2$ and
$\hat\theta_\infty$. This is usual in density estimation. We prefer to measure the loss
due to the estimator with the $L_1$ norm because this norm is well adapted to the
human eye, so that the graphical results are the best. The quadratic loss is
frequently used because the graphical results are easier to obtain, but our opinion
is that this norm does not sufficiently emphasize the differences between the
densities. See for instance the very small relative errors computed with the $L_2$
contrast. The $L_\infty$ norm has the opposite behavior: it accentuates every
difference. It could be a drawback when the local thresholding method is
considered and when too many details are kept (see again the case of the copula of
the couple Brent/ExonMobil).



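For q = 1, 2, $\infty$, the parameter $\hat\theta_q$ is defined by

\[ \hat\theta_q = \arg\min_{p = 1, \dots, 5} \Big( \arg\min_{\theta} E_q(\theta, p) \Big). \]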
Nevertheless, the choice of the best family does not depend on the choice of
the contrast. This is fundamental because each type of parametrical family is
linked to a specific behavior of the dependence, and the practitioner asks
for indications about the copula type. The study of the copula Cac versus
Brent allows us to decide that both series are independent. Observe that there
are small problems on the borders (but notice the very small scaling) although
our methodology is designed to remove this artefact. We think that a usual linear
kernel method becomes disastrous when the copula density comes near the
uniform. The copula densities Dow Jones versus Oncedor and Brent versus
ExonMobil are both Frank copulas but with opposite behaviors. It seems natural
that the series Brent and ExonMobil are dependent and vary in the same
direction. Oncedor (gold) is a hedge when the stock market collapses, which could
explain the negative dependence between Oncedor and the financial indices
(we observe the same kind of dependence for other indices like Ftse100uk,
Cac ...). The most delicate case is the copula Dow Jones versus Ftse100uk
because the peaks are more accentuated. In this case, the local thresholding
method produces a nice figure.

In conclusion, we present here an estimation method which is popular among
practitioners: first, the nonparametrical estimator can be used as a
benchmark to decide graphically whether the unknown copula density looks like a
copula density belonging to a well known parametrical family. In this case, the
parameter is estimated with plug-in methods using the benchmark. We do not study
here the properties of such an estimator or the goodness-of-fit test problem.
For a statistical test procedure, we refer to Gayraud and Tribouley (2008).



7 Proofs 

We first state the propositions needed to establish the main results. Next, we
prove Theorem m in two steps by proving both implications. Last, we establish
Proposition [1] and Corollary [5.1].

From now on we denote by K any constant that may change from one line to
another, which does not depend on j, k and n but which depends on the
wavelet and on $\|c\|_\infty$ and $\|c\|_2$.

7. 1 Preliminaries 

These preliminary results concern the estimation of the wavelet coefficients
and the scaling coefficients (denoted $c^{\epsilon_0}_{j,k}$ with $\epsilon_0 = (0, \dots, 0)$ to
unify the notation). Proposition [3] announces that the accuracy of the estimation
is as sharp as if the direct observations were available.



Proposition 2 Assume that the copula density belongs to Loo o,nd let 5 > 0. 
There exists a constant K > such that for any j such that 2^ < 2 



and for any [k, e) 



log{n) 



^[\cU-cU> Xn)<Kn-'^ (9) 
■ " (10) 



as soon as k, is chosen large enough. 

It is clear that (10) is a direct consequence of (9). The proof of (9) is deferred
to the Appendix. Note that we immediately deduce the following result.



Proposition 3 Under the same assumptions as in Proposition [2] on j and c,
there exists a constant K > 0 such that for any $(k, \epsilon)$



E 



^^logH 



n 



7.2 Proof of Theorem^ 



First, we prove the result for the linear estimator. Secondly, we prove the result
for the local thresholding method. We do not prove the result for the global
thresholding method since the techniques are the same, except that the required
large deviation inequality established in Proposition [2] is given by (10) instead
of (9).



7. 2. 1 Proof of Equivalence ^ 

On the one hand, let c be a copula density function belonging to $L_\infty$ satisfying,
for any n,

\[ E\|\hat c_L - c\|_2^2 \le K \left(\frac{\log(n)}{n}\right)^{\frac{2\alpha}{2\alpha+d}} \qquad (11) \]

for some constant K > 0. Let us prove that c also belongs to the space $B^{\alpha}_{2\infty}$.
Let us recall that the smoothing index used for the linear procedure is $j^*_n$
satisfying $2^{-j^*_n} \ge (n^{-1}\log(n))^{\frac{1}{2\alpha+d}}$. Since

\[ E\|\hat c_L - c\|_2^2 = E\Big\|\hat c_L - \sum_{k} c_{j^*_n,k}\,\phi_{j^*_n,k}\Big\|_2^2 + \Big\|\sum_{j \ge j^*_n}\sum_{k,\epsilon} c^{\epsilon}_{j,k}\,\psi^{\epsilon}_{j,k}\Big\|_2^2, \]

and following the assumption (11), we get

\[ \sum_{j \ge j^*_n}\sum_{k,\epsilon} (c^{\epsilon}_{j,k})^2 \le E\|\hat c_L - c\|_2^2 \le K \big(2^{-j^*_n}\big)^{2\alpha} \]

which is the announced result. On the other hand, let us suppose that $c \in B^{\alpha}_{2\infty}$.
Then, using the same techniques as in Genest et al. [9], we prove that

\[ E\|\hat c_L - c\|_2^2 \le K \left(\frac{\log(n)}{n}\right)^{\frac{2\alpha}{2\alpha+d}} \]

which ends the proof. The proof in Genest et al. [9] is given in the case d = 2
and needs some sharp control on the estimated coefficients because an optimal
result is established (there is no logarithmic term in the rate).



7.2.2 Proof of Equivalence Q (first step: 

When direct observations $(F_1(X_i^1), \dots, F_d(X_i^d))$ are available, we use the
estimator $\tilde c_{HL}$ built in the same way as $\hat c_{HL}$ but with the sequence of
coefficients $\tilde c^{\epsilon}_{j,k}$ defined in ([1]) and the threshold $\lambda_n/2$ instead of
$\lambda_n$. Let us take $j_n$, $J_n$ positive integers and $\lambda_n > 0$. Since we get

\[ E\|\hat c_{HL} - c\|_2^2 \le 2E\|\hat c_{HL} - \tilde c_{HL}\|_2^2 + 2E\|\tilde c_{HL} - c\|_2^2, \]

we have then to study the error term due to the fact that we use pseudo
observations



21 



T = E\\cHL - chlWI 



■■E 



+ E 



EEHk-cU?H\cU\ > A4i{lC,| > f } 



jn k,e 



+E 



+E 



Jn 



in k,e 



EE(^lkfH\cU\ > Xn}l{\cl,\ < ^} 
_ jn fc,e 

= Ti + T2 + T3 + T4. 
Using Proposition [31 we have 



n 



n 



(12) 



For the study of $T_2$, we separate the cases where the wavelet coefficients are
larger or smaller than the thresholding level $\lambda_n/4$. By the Cauchy-Schwarz
inequality, we have



T2 = E 



Jn ^ ^ .X/ \ \ 

E E^> - -lkf^{\-lk\ > > y} ^ + > 

in k,e 



II/2 



A. 



^EEN,/^-Cfc 
+EE^(C.-5)'iii4.i>T} 

Jn k,t 

Observe that, for any j, k, e, we have 



_ € I ^ An 



.1/2 



\cU'^\^j,k\<2''^\\mto^mL)- (13) 

For any $\delta > 0$, we now use the standard Bernstein inequality to obtain



^(\^k-cU>^)<Kn~' 



(14) 



which is valid for a choice of n large enough. Let us now fix r in ]0, 2[. Applying 
Proposition [3] and using (IT^ . it follows 
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T2<K ^^'^ 

in k,t 



1/2 Jn 



jn k,t 



t)'i:ei{I4.i>t'1) 

jn k,e J y 



for 



Ur. 



XA log(n) 



n 



Similarly, we have 



EE(?.)'iil-^>l ^ > ^} (i{l4,| < ^} + i{l4,| > 

in k,e ^ ' 



J, 



T.T.&H\cu\>^}i{\cu\<^} 



jn k,e 



+E 



Y.Y.^^lkfm,k\<^n}l{\cl,\>^} 

jn k,e 



jn k,e 



4 

r Jn 



for 



< 2 (A'n„, + 4'-A^'- 



implying that 



n" 



J n 



x) EEi(i4.i>^} 



in k,e 



Using (ITSll and Proposition [2], we get 



jn k,e \ ^ / 

Combining the bounds of Ti,T2,T3,T4 and choosing j„, J„ as indicated in 
Theorem [21 we get for 5 > 6 
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E\\cHL - c\\l < 2 EWcJTl - 41 + Kpn 
where 



n \ J \ ^ J k^e ^' ^ n(log(n)) 



2d 



On the one hand, let us suppose that c belongs to the weak Besov space
$W_L\big(\frac{2d}{2s+d}\big)$, which means that (for r := 2d/(2s + d))



\ / j„ k,e 



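It follows that

\[ \rho_n \le K \left(\frac{\log(n)}{n}\right)^{\frac{2s}{2s+d}}. \]

Using the standard result given in Theorem [3] when direct observations are
available, we also have

\[ E\|\tilde c_{HL} - c\|_2^2 \le K \left(\frac{\log(n)}{n}\right)^{\frac{2s}{2s+d}} \]

as soon as $c \in W_L\big(\frac{2d}{2s+d}\big) \cap B^{\frac{ds}{2s+d}}_{2\infty}$. This ends the proof of the first part of (7) of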
Theorem HI 



7.2.3 Proof of Equivalence Q (second step: '^=) 

Suppose that there exists M such that for any n,
$E\|\hat c_{HL} - c\|_2^2 \le M\,(n^{-1}\log(n))^{\frac{2s}{2s+d}}$. Since

\[ \sum_{j > J_n}\sum_{k,\epsilon} (c^{\epsilon}_{j,k})^2 \le E\|\hat c_{HL} - c\|_2^2 \]

and choosing $J_n$ as indicated in Theorem [2], we obtain

2s 

j>Jn k,t \ ^ / 

Using Definition [1] of the strong Besov spaces, we deduce that c necessarily
belongs to $B^{\frac{ds}{2s+d}}_{2\infty}$. Let us now study the sum of the squares of the detail
coefficients when these coefficients are small
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EE(4.)^i{I4.I<y} 

i>0 k,e 



E + E + E 

_j<jn j=jn j>Jn 

<Hi + H2 + Hs. 



Y,icUfH\clk\ < An/2} 



Since we have already proved that c G and taking A„ as indicated in 

Theorem m we deduce 



j>J„ fc,e V / 

Taking jn as in Theorem O we get 



^1 < ^ E 2^^^' [yJ - (y)' - ^ (y) 

Observe that 



i/9 = ^ 



EE(c;>)'i{|c;>i < f) {h\cU < An} + iii^j > aj 



Remembering that 



E 



EE(4^)'i{ls>l<An} 

in k,e 



<E\\6^rL-c\\l 



and using Proposition [2] and (fT4l) . we get 



< i?||c~ - c||^ + E E(S>)'P(|c],. - 51 > f ) 

+ EE(4^mic5-4.l>y) 

in 

as soon as $\delta$ is larger than 1. Combining these bounds and using Definition [2]
of the local weak Besov space, we conclude that $c \in W_L(r)$ with r such that
2 - r = 4s/(2s + d). Hence, we end the proof of the indirect direction of (7).
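The relation between the index r and the exponent of the rate can be checked directly. The short computation below only uses r = 2d/(2s + d), as fixed earlier in this proof.

```latex
\[
2 - r \;=\; 2 - \frac{2d}{2s+d} \;=\; \frac{2(2s+d) - 2d}{2s+d} \;=\; \frac{4s}{2s+d},
\qquad\text{so that}\qquad
\frac{2-r}{2} \;=\; \frac{2s}{2s+d}.
\]
```

The last quantity is the exponent appearing in the rate $(\log(n)/n)^{2s/(2s+d)}$.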
7.3 Proofs of Proposition [1] and Corollary [5.1]



The proof of the large inclusion given in Proposition [1] follows immediately
from the definitions of the functional spaces. Denote by $(c^{\epsilon}_{j,k})$ the sequence
of wavelet coefficients of a function c. Since we have



sup A-2^^(4,)2l{|4,|<A} 

0<A<1 fe_, 

^ sup A^-^EE(4^.)'i{l4.l<A} 

0<A<1 j>o J 



k 

< sup A'-2EE(4^.)'i{E(4^)'<2'^^An 



0<A<1 j>Q 



sup A^X2'^^El{E(4^.)'>2'^^An, 

0<A<1 j>Q e ^ 



it follows from Definition [3] that

\[ c \in W_G(r) \Longrightarrow c \in W_L(r). \]



To establish the strict inclusions, we build a sparse function belonging to 

ds 

82^"^ n but not to y^ci^^)- Let us choose a real number a such 

that ^ < a < s + |. Let us consider a function c with the sparse sequence 

Cj-^ such that at each level j G N and at each e G S'd, only [22s+'*'^J wavelet 
coefficients take the value {2'^ — i"j~^2~"-'-^^"-^ . The others are equal to 0. For all 

< A < 1, let jx be such that 2^^ = ({2'^ - 1)A)~" . We get 



j:y.h\cu>^}=t.y.h\cu>m 

j>Q k,e j<jx k,e 



implying that 



2da- 2d 



sup A2»+^ > A} < 00, 

0<A<1 



and the function c belongs to the local weak Besov space $W_L\big(\frac{2d}{2s+d}\big)$. Next,

put $\alpha' = (4\alpha s + 2sd + d^2)/(2(2s + d))$ and observe that $\alpha' < s + d/2$ since

_ 1 

a < s + d/2. For all < A < 1 let jl be such that = ((2*^ - 1)A) ^. We 
get 
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E 2'^' E 1{E(4;^)' > > (2^^ -1)5: 2'^' 

j>0 t k j<jl 

implying 

sup E 2"^^' E 1{E(4^)' > 2"^^ = oo. 

0<A<1 j>o e k 

It follows that the function c does not belong to the global weak Besov space
$W_G\big(\frac{2d}{2s+d}\big)$, which ends the proof of Proposition [1]. Notice that the
function c belongs to the strong Besov space $B^{\frac{ds}{2s+d}}_{2\infty}$ because for all $(j, \epsilon)$



{elk) < 25^^2-2"^ < 2"^^ 

fc,e 



implying that 

sup2^-^^^(4J2 < ^_ 

j>J k,e 

Corollary [5.1] is thus proved as well.
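The inequality $\alpha' < s + d/2$ used in the construction above follows from an elementary computation, written out here for completeness with $\alpha' = (4\alpha s + 2sd + d^2)/(2(2s+d))$ and s > 0.

```latex
\[
\alpha' < s + \frac{d}{2}
\iff 4\alpha s + 2sd + d^2 < (2s+d)^2 = 4s^2 + 4sd + d^2
\iff 4\alpha s < 4s^2 + 2sd
\iff \alpha < s + \frac{d}{2}.
\]
```

The last condition is exactly the upper bound imposed on $\alpha$ when the sparse sequence is built.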



8 Appendix 



This section aims at proving (9) of Proposition [2]. In the sequel we fix the
indices j and $k = (k_1, \dots, k_d)$ and take without loss of generality $\epsilon = 2^d - 1$. For
any i = 1, ..., n (the observation index) and any m = 1, ..., d (the coordinate
index), let us introduce the following notations



iV,(m) = #{zG{l,...,n};e,(Xr)7^0}, 
as univariate quantities and 



0(X/, . . . , XD = r,A^i{Xl), . . . , F,(Xf )) - r,,kiFiiXl), . . . , F,(Xf)) 
7V, = #{2e{l,...,n};O(Xi,...,Xf)^0} 

as d-variate quantities. As previously remarked in Genest et al. [9] (for d = 2),
from the definitions above we have
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m=l 



mi =1 



i^ik^SF^Axr)) n ^^^^^ 



m=l 



+ E 

mi ,m2=l 
mi^m2 

d 

mi=l 



m=l 
m^mi ,m2 



m=l 



(15) 



In the sequel, for m = 1, . . . , rf, we denote by Tmj{Xi) any term of the type 

r.Mm^l)) X • • • X i^ik^^jF^-uxtn)] X ... X 

i.e. such that there are exactly m factors $\xi_j(X_i^{m'})$ appearing in the product. The
cardinality of such terms $T_{m,j}(X_i)$ is equal to $C_d^m = \frac{d!}{m!\,(d-m)!}$. Observe that the
number of terms in (15) is $2^d - 1$. It is fundamental to notice that there is no
term $T_{0,j}(X_i) = \prod_{m=1,\dots,d} \psi_{j,k_m}(F_m(X_i^m))$.
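The counting of the terms can be checked on a generic product. Writing each factor as a sum a_m + b_m (a_m playing the role of the factor computed with the true margin and b_m the correction due to the pseudo-observations), the difference between the product of the sums and the product of the a_m expands into one term per non-empty subset of coordinates. The sketch below, with hypothetical function names, enumerates these terms and verifies the identity numerically.

```python
from itertools import combinations
from math import comb, prod

def expansion_terms(d):
    """Non-empty subsets of {0, ..., d-1}: positions where b_m is picked."""
    return [s for m in range(1, d + 1) for s in combinations(range(d), m)]

def expand_difference(a, b):
    """Return prod(a_m + b_m) - prod(a_m) and the sum of the expanded terms."""
    d = len(a)
    lhs = prod(x + y for x, y in zip(a, b)) - prod(a)
    rhs = sum(prod(b[m] if m in s else a[m] for m in range(d))
              for s in expansion_terms(d))
    return lhs, rhs

def count_by_size(d):
    """Number of terms with exactly m correction factors, for m = 1, ..., d."""
    terms = expansion_terms(d)
    return [sum(1 for s in terms if len(s) == m) for m in range(1, d + 1)]
```

For d = 3 there are 7 = 2^3 - 1 terms, distributed as 3, 3 and 1 terms with one, two and three correction factors, in agreement with the binomial coefficients; the term with no correction factor is the one removed by the subtraction.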



8.1 Technical lemmas 



We begin by giving technical lemmas. 



Lemma 1 There exists a universal constant $K_0$ such that for any m ∈ {1, ..., d}

\[ \forall t > 0, \quad P\Big(\max_{1 \le i \le n} |\Delta(X_i^m)| > t\Big) \le K_0 \exp(-2nt^2). \]

Lemma [1] is a consequence of the Dvoretzky-Kiefer-Wolfowitz inequality. For the
interested reader, the detailed proof of this lemma is given in Autin et al. [3].
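The Dvoretzky-Kiefer-Wolfowitz bound can be illustrated by a small simulation. The sketch below makes several assumptions that are ours: uniform samples, the supremum deviation between the empirical and the true distribution function (which dominates the maximum over the sample points), and the constant 2 in front of the exponential, which is the classical sharp constant rather than the unspecified $K_0$ of Lemma 1.

```python
import math
import random

def sup_deviation_uniform(sample):
    """sup_x |F_n(x) - x| for a sample compared with the uniform law on [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

def dkw_bound(n, t):
    """Right-hand side 2 exp(-2 n t^2) of the DKW inequality."""
    return 2.0 * math.exp(-2.0 * n * t * t)

def exceedance_frequency(n, t, trials, rng):
    """Empirical frequency of the event {sup deviation > t} over several samples."""
    hits = sum(1 for _ in range(trials)
               if sup_deviation_uniform([rng.random() for _ in range(n)]) > t)
    return hits / trials
```

Calling `exceedance_frequency(n, t, trials, random.Random(seed))` for moderate values of t and comparing with `dkw_bound(n, t)` shows how fast the deviation probability decreases with n.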

Lemma 2 Let 6 > and n be an integer such that nlog{n) > 2{6~^ VI). 
Then, there exists Ki > such that for any level j satisfying 



2'<l 



2n 



1/2 



3 \51og(n) 



and for any m G {1, . . . ,d}, 



F{Nj{m) > (L + 3)n2"^) V F{Nj > d{L + 3)r22~^) < Ki n 
The proof of Lemma [2] can be found in Autin et al. [3].

Lemma 3 Let us assume that c belongs to $L_\infty$ and let $(j, N) \in \mathbb{N}^2$. For all
1 ≤ p ≤ q ≤ d, for all subsets $S_p$ and $S_{q-p}$ of {1, ..., d} with cardinalities
equal to p and q - p having no common component, let us put for i = 1, ..., n,


Z,{Sp,Sq^p)= l[ ^,,k^iFm{Xr)) n {^P^'\k^,{Fm'{Xr')), (16) 



m£Sr) 



m'eSq-p 



with the following notation = 2^/^ip'{2^ . — k). 

For any fi > 2/^32"-^^/^, we have 



P 



AT 



i=l 



> /i^ < 2 exp [-K2N i^pi^ A ^2i~^'^/2 



where K2iK^ are constants such that 



Lemma [3] is a direct application of the Bernstein inequality with



\EZi(Sp, S, 



Q-PJ 



/ n '^j,km{Um) n c(mi,. 



. , Ud)dui X ... X dud 



m£Sp 



and in the same way, 



Var{Z,{Sp,Sq.p))<{L + lY\\c 



iooii nl^ll'?/' 11^'^ 



and 



\z,{Sp,s,^p)\<\mu\ni^'''^'''' 
8.2 Proof of Proposition [2]



By Equality (IT^ . we have for any A > 



p(i5-c^.,i>A)<EcrL. 



m=l 



for 



1 " 

1=1 



> 



X 



2^ - 1 / ■ 



Using a Taylor expansion, the following inequality holds for i = 1, ..., n and
m' = 1, ..., d



uxf)\ < 2^|A(xr')i(v^«),-,^,(F„,(xr')) +2^"i|A(xr')ni^'ii. 

implying that, for an associated Sd-m 



m'=0 

For m = 1, ... (i, let us introduce the following events 



max \A{Xr)\ 

r?i'=l,...,m 



m+m' 



\Zi{Sd 



Po,m = ] max |A(Xr)| < J i , Pi,™ = {iV,(m) < n, = (L + 3)n2-^ } 

l<i<n V Zn 



po = n 2>o,m > ^1 = n ^1.^ 

m=l 

It follows that for any Sp and any i5q-p 



m=l 



1 " 



4 = 1 



> n Po n Pi^ + P (PS) + P (v^) 



< E P 

m'=0 



n 



J i=l 



>/i + P (PS) + P (DJ) 
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where 



2-i(m+m'/2) 



2n 



m-\-m 
2 



2^||^1|--(L + 3) -U 
(2'^-l)(m+l)G 



Lm/2J • 



Fix K > and take A = wiil£lM_ Using Lemma [T] and Lemma [21 one gets 



F{V^q) V P(PJ) < d(ifo V i^i) n 



(17) 



as soon as 2^ < | (^ij^)^^^ Since fx > 2/^3 2-^('^-'"')/^ we apply Lemma [3] 
and we obtain 



Lm<2Yl exp [-/'S:22"^n (/i^ A /i2^-^(")/2)] + rf(/s:o V Ki) < K n 



m'=0 



as soon as 



5 2-' logHy/' / § 2^i^+d-m')/2iQg^ny 



Let us restrict ourselves to the case where: 



2-' < 



n 



l/d 



logn 



Assuming that n is large enough and that $\kappa$ is chosen large enough, Condition
(18) on $\mu$ is satisfied if, for any m' = 0, ..., m



, 2m + m' — 1 2m + c? 
a > V 



m + m' m + m/ + 1 
This condition is always satisfied since d ≥ 2. We obtain the announced result.
Figure 1. Estimation of the normal copula density of parameter 0.5 with n = 2000
(local thresholding): (a) true copula, (b) estimated copula with symmetrization, (c)
estimated copula with periodization, (d) estimated copula with zero padding.
Figure 2. Brent/Cac: Block Thresh. Method (left) and Local Thresh. Method (right)

Brent/Cac: distances between the benchmarks and the parametrical families







Figure 3. Brent /ExonMobil: Block Thresh. Method (left) and Local Thresh. Method 
(right) 







Family     Method   θ̂_1         E_1       θ̂_2         E_2       θ̂_∞         E_∞
Gaussian   Block    0.15        0.0396    0.14        0.0030    0.10        0.1337
Gaussian   Local    0.14        0.0492    0.13        0.0041    0.10        0.1437
Student    Block    (0.14,37)   0.0376    (0.13,81)   0.0030    (0.08,61)   0.1329
Student    Local    (0.14,95)   0.0491    (0.13,95)   0.0041    (0.09,80)   0.1411
Clayton    Block    0.15        0.0706    0.12        0.0099    0.05        0.1879
Clayton    Local    0.14        0.0799    0.11        0.0109    0.05        0.1967
Frank      Block    0.76        0.0301    0.83        0.0017    0.85        0.0957
Frank      Local    0.75        0.0393    0.80        0.0027    0.54        0.1355
Gumbel     Block    1.10        0.0436    1.07        0.0069    1.02        0.2309
Gumbel     Local    1.10        0.0529    1.06        0.0076    1.02        0.2298
All        Block    0.76        Frank     0.83        Frank     0.85        Frank
                                3.01 %                0.17 %                6.61 %
All        Local    0.75        Frank     0.80        Frank     0.54        Frank
                                3.93 %                0.27 %                10.64 %

Table 6

Brent/ExonMobil: distances between the benchmarks and the parametrical families
Figure 4. DowJones/Oncedor: Block Thresh. Method (left) and Local Thresh.
Method (right)
DowJones/Oncedor: distances between the benchmarks and the parametrical families
Figure 5. DowJones/Ftse100uk: Block Thresh. Method (left) and Local Thresh.
Method (right)